Introduction to the grammar of graphics

Erik Fredner

2024-08-24

Why visualize data well?

What are key considerations for visualizing data well?

  • Keep it simple! (KISS principle)
  • Don’t cherry-pick
    • Represent data truthfully
  • Make accessible visualizations
    • e.g., color palettes like viridis remain distinguishable for colorblind viewers
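
As a minimal sketch of the accessibility point above, a viridis palette can be applied to a categorical variable with ggplot2's scale_color_viridis_d(). The small data frame food_small is a stand-in built from five rows of the food table shown later in these slides; the name food_small and the point size are illustrative choices, not part of the original material.

```r
library(ggplot2)

# Stand-in data: five rows copied from the food table shown in these slides.
# The name food_small is an assumption made for this example.
food_small <- data.frame(
  item = c("Apple", "Asparagus", "Avocado", "Banana", "Chickpea"),
  food_group = c("fruit", "vegetable", "fruit", "fruit", "grains"),
  calories = c(52, 20, 160, 89, 180),
  carbs = c(13.8, 3.88, 8.53, 22.8, 30.0)
)

p_viridis <- food_small |>
  ggplot() +
  # color each point by its food group
  geom_point(aes(x = calories, y = carbs, color = food_group), size = 4) +
  # use the discrete viridis palette instead of the default colors
  scale_color_viridis_d()

p_viridis
```

For a continuous variable, scale_color_viridis_c() plays the same role.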

What is the grammar of graphics?

Origin

Key gg concepts

Data visualizations (simple or complex) are composed of layers. Each layer consists of three parts:

Key    Description
data   Tabular dataset associated with the layer
geom   Graphical element that represents each observation
aes    Mappings that associate features in the dataset with properties of the geometry (e.g., x, y, color)
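
To make the three parts concrete, here is a hedged sketch of a single layer with data, geom, and aes each spelled out. The data frame food_small is a stand-in built from five rows of the food table in these slides, and the names food_small, point_layer, and p_layer are chosen for this example only.

```r
library(ggplot2)

# Stand-in data: five rows copied from the food table shown in these slides.
# The name food_small is an assumption made for this example.
food_small <- data.frame(
  item = c("Apple", "Asparagus", "Avocado", "Banana", "Chickpea"),
  food_group = c("fruit", "vegetable", "fruit", "fruit", "grains"),
  calories = c(52, 20, 160, 89, 180),
  carbs = c(13.8, 3.88, 8.53, 22.8, 30.0)
)

# geom: geom_point() draws one point per observation
point_layer <- geom_point(
  # data: the table behind this layer
  data = food_small,
  # aes: map the calories and carbs columns to x and y positions
  mapping = aes(x = calories, y = carbs)
)

# an empty plot plus one layer
p_layer <- ggplot() + point_layer

p_layer
```

In everyday code the data usually arrives through the pipe and mapping is passed positionally, as in the examples that follow.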

Example data set: food

library(tidyverse)

food <- read_csv("../data/food.csv")

food |>
  select(item, food_group, calories, carbs)
# A tibble: 61 × 4
   item        food_group calories carbs
   <chr>       <chr>         <dbl> <dbl>
 1 Apple       fruit            52 13.8 
 2 Asparagus   vegetable        20  3.88
 3 Avocado     fruit           160  8.53
 4 Banana      fruit            89 22.8 
 5 Chickpea    grains          180 30.0 
 6 String Bean vegetable        31  7.13
 7 Beef        meat            288  0   
 8 Bell Pepper vegetable        26  6.03
 9 Crab        fish             87  0.04
10 Broccoli    vegetable        34  6.64
# ℹ 51 more rows

Example scatter plot

Observations represented by dots:

food |>
  ggplot() +
  geom_point(aes(x = calories, y = carbs))

Example text plot

Observations represented by the item label:

food |>
  ggplot() +
  geom_text(aes(x = calories, y = carbs, label = item))

Example bar plot (more complex)

Observations represented by bars:

food |>
  # filter for high cholesterol foods:
  filter(cholesterol > 50) |>
  ggplot() +
  # set nice x axis label:
  xlab("Food") +
  # set nice y axis label:
  ylab("Cholesterol (mg)") +
  # set bar chart
  geom_col(aes(
    # sort bars by descending amount of cholesterol
    x = reorder(item, -cholesterol),
    y = cholesterol,
    # set bar color (fill) by food_group
    fill = food_group
  )) +
  # tilt labels for readability:
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Syntax review

With ggplot, you can combine multiple layers to create simple or complex data visualizations. In general terms, the structure is:

data |>
  ggplot() +
  geom_...(aes(x = ..., y = ...)) +
  ...
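
As one possible instance of this template, the sketch below stacks a point layer and a text layer on the same plot. It uses food_small, a stand-in built from five rows of the food table in these slides; the name food_small and the vjust offset are illustrative assumptions.

```r
library(ggplot2)

# Stand-in data: five rows copied from the food table shown in these slides.
# The name food_small is an assumption made for this example.
food_small <- data.frame(
  item = c("Apple", "Asparagus", "Avocado", "Banana", "Chickpea"),
  food_group = c("fruit", "vegetable", "fruit", "fruit", "grains"),
  calories = c(52, 20, 160, 89, 180),
  carbs = c(13.8, 3.88, 8.53, 22.8, 30.0)
)

p_two_layers <- food_small |>
  ggplot() +
  # layer 1: one dot per food
  geom_point(aes(x = calories, y = carbs)) +
  # layer 2: the item name, nudged above each dot
  geom_text(aes(x = calories, y = carbs, label = item), vjust = -1)

p_two_layers
```

Each `+` adds another layer, and layers are drawn in the order they are added.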

Fixed aesthetics in ggplot

Why hard-code aesthetics?

  • Sometimes you want to set aesthetics that are not tied to the data.
  • For example, you might want to set the color of all points to be green.
  • Or you might want to set the size of all points to be huge.

Example scatter plot with fixed aesthetics

food |>
  ggplot() +
  # note that fixed values (color, size) go outside of aes():
  geom_point(aes(x = calories, y = carbs), color = "green", size = 10)
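
A common slip is to put a fixed value inside aes(). The sketch below contrasts the two placements, using food_small, a stand-in built from five rows of the food table in these slides; the names food_small, p_fixed, and p_mapped are chosen for this example only.

```r
library(ggplot2)

# Stand-in data: five rows copied from the food table shown in these slides.
# The name food_small is an assumption made for this example.
food_small <- data.frame(
  item = c("Apple", "Asparagus", "Avocado", "Banana", "Chickpea"),
  food_group = c("fruit", "vegetable", "fruit", "fruit", "grains"),
  calories = c(52, 20, 160, 89, 180),
  carbs = c(13.8, 3.88, 8.53, 22.8, 30.0)
)

# Fixed: outside aes(), "green" is used literally as the point color
p_fixed <- food_small |>
  ggplot() +
  geom_point(aes(x = calories, y = carbs), color = "green")

# Mapped by mistake: inside aes(), "green" is treated as a data value with
# a single category, so ggplot picks a color from its default palette
# and adds a legend
p_mapped <- food_small |>
  ggplot() +
  geom_point(aes(x = calories, y = carbs, color = "green"))

p_fixed
p_mapped
```

Rule of thumb: values that come from a column go inside aes(); values that apply to every observation go outside it.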